Sampling of Common Items: an Unrecognized Source of Error in Test Equating

Authors

Michalis P. Michaelides

Edward H. Haertel

Abstract

There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. By similar reasoning, the common items that are embedded in test forms are also sampled from a larger pool of items that could potentially serve as common items. Thus, there is additional error variance due to the sampling of common items. Currently, common items are treated as fixed; the conventional standard error of equating captures only the variance due to the sampling of examinees. In this study, a formula for quantifying the standard error due to the sampling of the common items is derived using the delta method and assuming that equating is carried out with the mean/sigma method. The analytic formula relies on the assumption of bivariate normality of the IRT difficulty parameter estimates. The derived standard error and a bootstrap approximation for the same quantity are calculated for a statewide assessment under both three- and one-parameter logistic IRT models; for the polytomous items, a graded response model is fitted. For the one-parameter logistic case, a small-sample bootstrap approximation to the standard error of equating due to the sampling of examinees is derived for comparison purposes. There was some discrepancy between the analytic and the bootstrap approximations of the error due to the sampling of common items. Examination of the assumption of bivariate normality of the difficulty parameter estimates showed that the assumption does not hold for the data set analyzed. For simulated data drawn from a population that was distributed as bivariate normal, the two methods for estimating the error gave nearly identical results, confirming the correctness of the analytic approximation.
The comparison with the examinee-sampling standard error of equating revealed that the two sources of equating error were of about the same magnitude. In other words, the conventional standard error of the equating function reflects only about half the equating error variation. Numerical results demonstrate that for individual examinee scores the two equating errors comprised only a small proportion of the total error variance; measurement error was the largest component in individual score variability. For group-level scores, though, the picture was different. Measurement error in score summaries shrinks as sample size increases. Examinee-sampling equating error also decreases as samples become larger. Error due to common-item sampling does not depend on the size of the examinee sample; it is ...

Acknowledgments: We would like to thank Michael Nering and Kevin Sweeney of Measured Progress Inc. for providing the data analyzed in this study and David Rogosa and Robert Tibshirani for their insightful comments.
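As a rough illustration of the idea in the abstract, the sketch below applies the standard mean/sigma linking (slope A = sd(b_Y) / sd(b_X), intercept B = mean(b_Y) - A * mean(b_X)) and bootstraps over the common items rather than over examinees. It is not the authors' code or their analytic delta-method formula: the function names, the synthetic difficulty estimates, and the bivariate normal parameters are all assumptions made for the example.

```python
import numpy as np


def mean_sigma(b_x, b_y):
    """Mean/sigma linking coefficients (A, B) so that theta_Y = A * theta_X + B.

    b_x, b_y: difficulty estimates of the same common items on the
    form-X and form-Y scales (hypothetical inputs).
    """
    b_x = np.asarray(b_x, dtype=float)
    b_y = np.asarray(b_y, dtype=float)
    A = b_y.std(ddof=1) / b_x.std(ddof=1)
    B = b_y.mean() - A * b_x.mean()
    return A, B


def bootstrap_se_common_items(b_x, b_y, theta, n_boot=2000, seed=0):
    """Bootstrap SE of the equated value at each theta, resampling items.

    Common items are resampled with replacement as (b_x, b_y) pairs;
    examinees are held fixed, so only item-sampling error is captured.
    """
    rng = np.random.default_rng(seed)
    b_x = np.asarray(b_x, dtype=float)
    b_y = np.asarray(b_y, dtype=float)
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    n = b_x.size
    equated = np.empty((n_boot, theta.size))
    for r in range(n_boot):
        idx = rng.integers(0, n, size=n)
        # Redraw the rare resample in which every b_x is identical,
        # because the slope is undefined when sd(b_x) is zero.
        while np.ptp(b_x[idx]) == 0.0:
            idx = rng.integers(0, n, size=n)
        A, B = mean_sigma(b_x[idx], b_y[idx])
        equated[r] = A * theta + B
    return equated.std(axis=0, ddof=1)


if __name__ == "__main__":
    # Synthetic common-item difficulties drawn from a bivariate normal
    # population (values chosen only for illustration).
    rng = np.random.default_rng(42)
    cov = [[1.0, 0.9], [0.9, 1.0]]
    b = rng.multivariate_normal([0.0, 0.2], cov, size=30)
    thetas = np.array([-2.0, 0.0, 2.0])
    print(mean_sigma(b[:, 0], b[:, 1]))
    print(bootstrap_se_common_items(b[:, 0], b[:, 1], thetas))
```

Because each replicate redraws the items and leaves the examinee data untouched, the resulting standard error does not shrink when more examinees are tested, which is the property the abstract attributes to common-item sampling error.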

Similar resources

Utility of Complex Alternatives in Multiple-Choice Items: The Case of All of the Above

This study investigated the utility of all of the above (AOTA) as a test option in multiple-choice items. It aimed at estimating item fit, item difficulty, item discrimination, and guess factor of such a choice. Five reading passages of the Key English Test (KET, 2010) were adapted. The test was reconstructed in 2 parallel forms: Test 1 did not include the abovementioned alternative, whereas Te...

Source Discrimination for Unrecognized Items? On Empirical Arguments Against the High-Threshold Model of Source Memory

Recently, Starns et al. (2008) collected source judgments for old items, which participants had claimed to be new, and found residual source discriminability depending on the old-new response bias. The finding was interpreted as evidence in favor of the multivariate signal-detection model but against the high-threshold model of source memory. According to the latter, “new” responses only follow...

Investigating the factors affecting whistle-blowing by employees in the hospital

Background: The occurrence of all kinds of errors and mistakes imposes many costs on the hospital and society. Whistleblowing and error reporting play an essential role in preventing and reducing errors, but the rate of error reporting in hospitals is low. This research was conducted to investigate the effect of selected individual factors on whistleblowing by hospital employees. M...

The effects of the violation of local independence assumption on the person measures under the Rasch model

Local independence of test items is an assumption in all Item Response Theory (IRT) models. That is, the items in a test should not be related to each other. Sharing a common passage, which is prevalent in reading comprehension tests, cloze tests and C-Tests, can be a potential source of local item dependence (LID). It is argued in the literature that LID results in biased parameter estimation ...

Review Psychometric Parameters of the 29th Residency Test (1380) According to the Classic Test Theory (CTT)

Introduction. Selecting the best group and making a good decision are among the most important concerns of the health and medical education ministry and of all entrants in the residency test. Administering a reliable, well-constructed exam will greatly reduce doubts. Considering different scientific methods consist of (precisely review of curriculum by the designer committee, sampling o...

Publication date: 2004
